1,466 research outputs found

Predicting protein disorder by analyzing amino acid sequence

Author: Yang Jack Y
Yang Mary Qu
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion: We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity etc.) and using an ensemble method are beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins.

Crossref

Directory of Open Access Journals

PubMed Central

Diversity of core promoter elements comprising human bidirectional promoters

Author: Elnitski Laura L
Yang Mary Qu
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Bidirectional promoters lie between adjacent genes, which are transcribed from opposite strands of DNA. The functional mechanisms underlying the activation of bidirectional promoters are currently uncharacterised. To define the core promoter elements of bidirectional promoters in human, we mapped motifs for TATA, INR, BRE and DPE, as well as CpG-islands. Results: We found a consistently high correspondence between C+G content, CpG-island presence and an average expression level increasing the median level for all genes in bidirectional promoters. These CpG-rich promoters showed discrete initiation patterns rather than broad regions of transcription initiation, as are typically seen for CpG-island promoters. CpG-islands encompass both TSSs within bidirectional promoters, providing an explanation for the symmetrical co-expression patterns of many of these genes. In contrast, TATA motifs appear to be asymmetrically positioned at one TSS or the other. Conclusion: Our findings demonstrate that bidirectional promoters utilize a variety of core promoter elements to initiate transcription. CpG-islands dominate the regulatory landscape of this group of promoters.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prediction-based approaches to characterize bidirectional promoters in the mammalian genome

Author: Elnitski Laura L
Yang Mary Qu
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Machine learning approaches are emerging as a way to discriminate various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores to discriminate functional DNA from nonfunctional DNA included using Markov models trained to distinguish promoter and enhancer sequences from ancestral repeats. We proposed that knowledge gleaned from those methods could be further refined using a multiple class predictor to separate classes of promoter elements from enhancers or nonfunctional DNA. Results: We extended our previous work, which identified over 5,000 candidate bidirectional promoters in the human genome, to map the orthologous promoter regions in the mouse genome. Our algorithm measured the robustness of evidence provided by the spliced EST annotations and incorporated evidence from annotations of UCSC Known Genes and GenBank mRNA. In preparation for de novo prediction of this promoter type, we examined characteristic features of the dataset as a whole. For instance, bidirectional promoters score very highly among all functional elements for Regulatory Potential Scores. This result was unexpected due to the limited sequence conservation found in these noncoding regions. We demonstrate that bidirectional promoters can be classified apart from other genomic features including non-bidirectional promoters, i.e. those promoters having no nearby upstream genes. Furthermore, bidirectional promoters consistently score at the level of very highly conserved functional elements in the genome: developmental enhancers. The high scores are due to sequence-based characteristics within the promoters, not the surrounding exons. These results indicate that high-scoring RP regions can be deconvoluted into various functional classes of genomic elements. Using a multiple class predictor we are able to discriminate bidirectional promoters from enhancers, non-bidirectional promoters, and non-promoter regions on the basis of RP scores and CpG islands. Conclusions: We examine orthology at bidirectional promoters, use discriminatory machine learning approaches to differentiate multiple types of promoters from other functional and nonfunctional features in the genome, and begin the process of deconvoluting classes of functional regions that score well with RP scores. These types of approaches precede supervised learning techniques to discover unannotated promoter regions.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comparative analyses of bidirectional promoters in vertebrates

Author: Elnitski Laura
Taylor James
Yang Mary Qu
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Orthologous genes with deep phylogenetic histories are likely to retain similar regulatory features. In this report we utilize orthology assignments for pairs of genes co-regulated by bidirectional promoters to map the ancestral history of the promoter regions. Results: Our mapping of bidirectional promoters from humans to fish shows that many such promoters emerged after the divergence of chickens and fish. Furthermore, annotations of promoters in deep phylogenies enable detection of missing data or assembly problems present in higher vertebrates. The functional importance of bidirectional promoters is indicated by selective pressure to maintain the arrangement of genes regulated by the promoter over long evolutionary time spans. Characteristics unique to bidirectional promoters are further elucidated using a technique for unsupervised classification, known as ESPERR. Conclusion: Results of these analyses will aid in our understanding of the evolution of bidirectional promoters, including whether the regulation of two genes evolved as a consequence of their proximity or if function dictated their co-regulation.

Crossref

Directory of Open Access Journals

PubMed Central

Prediction of DNA-binding residues from protein sequence information using random forests

Author: Jack Y Yang
Liangjiang Wang
Mary Qu Yang
Publication venue: BioMed Central
Publication date: 01/01/2009

Springer

Springer - Publisher Connector

PubMed Central

Analyzing adjuvant radiotherapy suggests a non monotonic radio-sensitivity over tumor volumes

Author: Deng Youping
Niemierko Andrzej
Yang Jack Y
Yang Mary Qu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Adjuvant Radiotherapy (RT) after surgical removal of tumors has proved beneficial in long-term tumor control and treatment planning. For many years, it has been widely accepted that the radio-sensitivity of tumors to radiotherapy decreases with tumor size, and RT models based on Poisson statistics have been used extensively to validate clinical data. Results: We found that Poisson statistics on RT is actually derived from bacterial cells, despite many validations from clinical data. However, cancerous cells do have abnormal cellular communications and use chemical messengers to signal both surrounding normal and cancerous cells to develop new blood vessels, to invade, to metastasize and, in general, to overcome intercellular spatial confinements. We therefore investigated the cell-killing effects of adjuvant RT and found that radio-sensitivity is actually not a monotonic function of volume, as was previously believed. We present a detailed analysis and explanation to justify this statement. Based on EUD, we present an equivalent radio-sensitivity model. Conclusion: We conclude that radio-sensitivity is a sophisticated function of tumor volume, since tumor responses to radiotherapy also depend on cellular communications.

Aquila Digital Community

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

Analyzing Adjuvant Radiotherapy Suggests a Non Monotonic Radio-Sensitivity Over Tumor Volumes

Author: Deng Youping
Niemierko Andrzej
Yang Jack Y.
Yang Mary Qu
Publication venue: The Aquila Digital Community
Publication date: 01/01/2007

Aquila Digital Community

Genomics, Molecular Imaging, Bioinformatics, and Bio-Nano-Info Integration are Synergistic Components of Translational Medicine and Personalized Healthcare Research

Author: Arabnia Hamid R.
Deng Youping
Yang Jack Y.
Yang Mary Qu
Publication venue: The Aquila Digital Community
Publication date: 01/01/2007

Supported by the National Science Foundation (NSF), the International Society of Intelligent Biological Medicine (ISIBM), the International Journal of Computational Biology and Drug Design and the International Journal of Functional Informatics and Personalized Medicine, IEEE 7th Bioinformatics and Bioengineering attracted more than 600 papers and 500 researchers and medical doctors. It was the only synergistic inter/multidisciplinary IEEE conference with 24 Keynote Lectures, 7 Tutorials, 5 Cutting-Edge Research Workshops and 32 Scientific Sessions including 11 Special Research Interest Sessions that were designed dynamically at Harvard in response to the current research trends and advances. The committee was very grateful for the IEEE Plenary Keynote Lectures given by: Dr. A. Keith Dunker (Indiana), Dr. Jun Liu (Harvard), Dr. Brian Athey (Michigan), Dr. Mark Borodovsky (Georgia Tech and President of ISIBM), Dr. Hamid Arabnia (Georgia and Vice-President of ISIBM), Dr. Ruzena Bajcsy (Berkeley and Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Chih-Ming Ho (UCLA and Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Andy Baxevanis (United States National Institutes of Health), Dr. Arif Ghafoor (Purdue), Dr. John Quackenbush (Harvard), Dr. Eric Jakobsson (UIUC), Dr. Vladimir Uversky (Indiana), Dr. Laura Elnitski (United States National Institutes of Health) and other world-class scientific leaders. The Harvard meeting was a large academic event fully sponsored by IEEE, both financially and academically. After a rigorous peer-review process, the committee selected 27 high-quality research papers from 600 submissions. The committee is grateful for contributions from keynote speakers Dr. Russ Altman (IEEE BIBM conference keynote lecturer on combining simulation and machine learning to recognize function in 4D), Dr. Mary Qu Yang (IEEE BIBM workshop keynote lecturer on new initiatives of detecting microscopic disease using machine learning and molecular biology, http://ieeexplore.ieee.org/servlet/opac?punumber=4425386) and Dr. Jack Y. Yang (IEEE BIBM workshop keynote lecturer on data mining and knowledge discovery in translational medicine) from the first IEEE Computer Society BioInformatics and BioMedicine (IEEE BIBM) international conference and workshops, November 2-4, 2007, Silicon Valley, California, USA.

Aquila Digital Community

PubMed Central

Supervised Learning Method for the Prediction of Subcellular Localization of Proteins Using Amino Acid and Amino Acid Pair Composition

Author: Deng Youping
Habib Tanwir
Yang Jack Y.
Yang Mary Qu
Zhang Chaoyang
Publication venue: The Aquila Digital Community
Publication date: 01/01/2007

Background: Knowing where a protein occurs in the cell is an important step in understanding its function. It is highly desirable to predict a protein's subcellular locations automatically from its sequence. The most studied methods for predicting subcellular localization of proteins rely on signal peptides, on sequence homology, and on the correlation between location and the total amino acid composition of proteins. Taking amino acid composition and amino acid pair composition into consideration helps improve the prediction accuracy. Results: We constructed a dataset of protein sequences from the SWISS-PROT database and segmented them into 12 classes based on their subcellular locations. SVM modules were trained to predict the subcellular location based on amino acid composition and amino acid pair composition. Results were calculated after 10-fold cross validation. The Radial Basis Function (RBF) kernel outperformed polynomial and linear kernel functions. Total prediction accuracy reached 71.8% for amino acid composition and 77.0% for amino acid pair composition. In order to observe the impact of the number of subcellular locations, we constructed two more datasets of nine and five subcellular locations. Total accuracy was further improved to 79.9% and 85.66%, respectively. Conclusions: A new SVM-based approach is presented, based on amino acid and amino acid pair composition. The results show that data simulation and taking more protein features into consideration improve the accuracy to a great extent. It was also noticed that the data set needs to be crafted to take account of the distribution of data in all the classes.
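
The two feature sets named in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions (20 single-residue frequencies and 400 adjacent-pair frequencies); the abstract does not give the authors' exact encoding, and the function names `aa_composition` and `pair_composition` are hypothetical. The resulting vectors are the kind of input an SVM classifier could be trained on.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return 20 fractions: how often each standard residue occurs in seq.

    Assumption: non-standard letters (X, B, Z, ...) are simply ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    counts = Counter(residues)
    n = len(residues)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def pair_composition(seq):
    """Return 400 fractions: how often each adjacent residue pair occurs.

    Assumption: a pair is counted only when both neighbours are standard
    residues; pairs are ordered AA, AC, AD, ..., YY.
    """
    s = seq.upper()
    pairs = Counter(
        s[i:i + 2]
        for i in range(len(s) - 1)
        if s[i] in AMINO_ACIDS and s[i + 1] in AMINO_ACIDS
    )
    total = sum(pairs.values())
    return [
        pairs[a + b] / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence, for illustration only
    print(len(aa_composition(example)), len(pair_composition(example)))
```

Pair composition keeps some local sequence-order information that plain composition discards, which is one plausible reading of why the abstract reports higher accuracy for it.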

Aquila Digital Community

Springer - Publisher Connector

PubMed Central

Investigation of transmembrane proteins using a computational approach

Author: Deng Youping
Dunker A Keith
Huang Xudong
Yang Jack Y
Yang Mary Qu
Publication venue: BioMed Central
Publication date: 20/03/2008

Background: An important subfamily of membrane proteins are the transmembrane α-helical proteins, in which the membrane-spanning regions are made up of α-helices. Given the obvious biological and medical significance of these proteins, it is of tremendous practical importance to identify the location of transmembrane segments. The difficulty of inferring the secondary or tertiary structure of transmembrane proteins using experimental techniques has led to a surge of interest in applying techniques from machine learning and bioinformatics to infer secondary structure from primary structure in these proteins. We are therefore interested in determining which physicochemical properties are most useful for discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins, and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins, and in using the results of these investigations to develop classifiers to identify transmembrane segments in transmembrane proteins. Results: We determined that the most useful properties for discriminating transmembrane segments from non-transmembrane segments and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins were hydropathy, polarity, and flexibility, and used the results of this analysis to construct classifiers to discriminate transmembrane segments from non-transmembrane segments using four classification techniques: two variants of the Self-Organizing Global Ranking algorithm, a decision tree algorithm, and a support vector machine algorithm. All four techniques exhibited good performance, with out-of-sample accuracies of approximately 75%. 
Conclusions: Several interesting observations emerged from our study: intrinsically unstructured segments and transmembrane segments tend to have opposite properties; transmembrane proteins appear to be much richer in intrinsically unstructured segments than other proteins; and, in approximately 70% of transmembrane proteins that contain intrinsically unstructured segments, the intrinsically unstructured segments are close to transmembrane segments.
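
Hydropathy, one of the properties this abstract singles out, is commonly turned into a per-position feature by averaging a residue scale over a sliding window. The sketch below is an illustration only: it assumes the Kyte-Doolittle scale and a 19-residue window (a common convention for spotting membrane-spanning helices), neither of which is stated in the abstract. The function name `windowed_hydropathy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(seq, window=19):
    """Mean hydropathy of every full window of `window` residues.

    Returns one value per window start, so the result has
    len(seq) - window + 1 entries (empty if seq is shorter than window).
    Raises KeyError on non-standard residue letters.
    """
    values = [KYTE_DOOLITTLE[r] for r in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Made-up sequence: a hydrophobic stretch flanked by charged residues.
    example = "KKDE" + "LIVALLVIAGLLVAIVLLA" + "RRKE"
    profile = windowed_hydropathy(example)
    print(max(profile) > 0 > profile[0] - max(profile))
```

Windows with a high mean are candidate transmembrane segments, while strongly negative windows are hydrophilic, which is consistent with the abstract's observation that unstructured and transmembrane segments tend to have opposite properties.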

Harvard University - DASH

Springer - Publisher Connector

PubMed Central